Find the duplicate number [Binary Search, Two Pointers]

Time: O(N); Space: O(1); hard

Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.

Example 1:

Input: nums = [1,3,4,2,2]

Output: 2

Example 2:

Input: nums = [3,1,3,4,2]

Output: 3

Notes:

  • You must not modify the array (assume the array is read only).

  • You must use only constant, O(1) extra space.

  • Your runtime complexity should be less than O(n2).

  • There is only one duplicate number in the array, but it could be repeated more than once.

1. Floyd’s Tortoise and Hare (Cycle Detection) [O(N), O(1)]

Intuition

The idea is to reduce the problem to Linked List Cycle II:

Given a linked list, return the node where the cycle begins.

First of all, where does the cycle come from?

Let’s use the function f(x) = nums[x] to construct the sequence: x, nums[x], nums[nums[x]], nums[nums[nums[x]]], ….

Each new element in the sequence is an element in nums at the index of the previous element.

If one starts from x = nums[0], such a sequence will produce a linked list with a cycle.

The cycle appears because nums contains duplicates. The duplicate node is a cycle entrance.

Here is how it works:

The example above is simple because the loop is small. Here is a more interesting example

Now the problem is to find the entrance of the cycle.

Algorithm

Floyd’s algorithm consists of two phases and uses two pointers, usually called tortoise and hare.

  1. In phase 1, hare = nums[nums[hare]] is twice as fast as tortoise = nums[tortoise].

Since the hare goes fast, it would be the first one who enters the cycle and starts to run around the cycle. At some point, the tortoise enters the cycle as well, and since it’s moving slower the hare catches the tortoise up at some intersection point. Now phase 1 is over, and the tortoise has lost.

Note that the intersection point is not the cycle entrance in the general case.

To compute the intersection point, let’s note that the hare has traversed twice as many nodes as the tortoise, i.e. 2d(tortoise)=d(hare), that means

2(F + a) = F + nC + a, where n is some integer.

Hence the coordinate of the intersection point is F + a = nC.

  1. In phase 2, we give the tortoise a second chance by slowing down the hare, so that it now moves with the speed of tortoise: tortoise = nums[tortoise], hare = nums[hare]. The tortoise is back at the starting position, and the hare starts from the intersection point.

Let’s show that this time they meet at the cycle entrance after FF steps.

  • The tortoise started from zero, so its position after FF steps is FF.

  • The hare started at the intersection point F + a = nCF+a=nC, so its position after F steps is nC + FnC+F, that is the same point as FF.

  • So the tortoise and the (slowed down) hare will meet at the entrance of the cycle.

See also: https://leetcode.com/problems/find-the-duplicate-number/solution/

[5]:
class Solution1(object):
    """
    Time: O(N)
    Space: O(1)
    """
    def findDuplicate(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        # Treat each (key, value) pair of the array as the (pointer, next) node of the linked list,
        # thus the duplicated number will be the begin of the cycle in the linked list.
        # Besides, there is always a cycle in the linked list which
        # starts from the first element of the array.
        slow = nums[0]
        fast = nums[nums[0]]

        while slow != fast:
            slow = nums[slow]
            fast = nums[nums[fast]]

        fast = 0
        while slow != fast:
            slow = nums[slow]
            fast = nums[fast]

        return slow
[6]:
s = Solution1()

nums = [1,3,4,2,2]
assert s.findDuplicate(nums) == 2

nums = [3,1,3,4,2]
assert s.findDuplicate(nums) == 3

2. Binary search method

[7]:

class Solution2(object):
    def findDuplicate(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        left, right = 1, len(nums) - 1

        while left <= right:
            mid = left + (right - left) // 2
            # Get count of num <= mid.
            count = 0
            for num in nums:
                if num <= mid:
                    count += 1
            if count > mid:
                right = mid - 1
            else:
                left = mid + 1
        return left
[8]:
s = Solution2()

nums = [1,3,4,2,2]
assert s.findDuplicate(nums) == 2

nums = [3,1,3,4,2]
assert s.findDuplicate(nums) == 3

3. Mark visited values

[9]:
class Solution3(object):
    def findDuplicate(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        duplicate = 0
        # Mark the value as visited by negative.
        for num in nums:
            if nums[abs(num) - 1] > 0:
                nums[abs(num) - 1] *= -1
            else:
                duplicate = abs(num)
                break
        # Rollback the value.
        for num in nums:
            if nums[abs(num) - 1] < 0:
                nums[abs(num) - 1] *= -1
            else:
                break
        return duplicate
[10]:
s = Solution3()

nums = [1,3,4,2,2]
assert s.findDuplicate(nums) == 2

nums = [3,1,3,4,2]
assert s.findDuplicate(nums) == 3

4. Using Sort [O(NLgN), O(1) or O(n)]

Intuition

If the numbers are sorted, then any duplicate numbers will be adjacent in the sorted array.

Algorithm

Given the intuition, the algorithm follows fairly simply. First, we sort the array, and then we compare each element to the previous element. Because there is exactly one duplicated element in the array, we know that the array is of at least length 2, and we can return the duplicate element as soon as we find it.

[11]:
class Solution4(object):
    def findDuplicate(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        nums.sort()
        for i in range(1, len(nums)):
            if nums[i] == nums[i-1]:
                return nums[i]
[12]:
s = Solution4()

nums = [1,3,4,2,2]
assert s.findDuplicate(nums) == 2

nums = [3,1,3,4,2]
assert s.findDuplicate(nums) == 3

5. Using Set [O(N), O(N)]

Intuition

If we store each element as we iterate over the array, we can simply check each element as we iterate over the array.

Algorithm

In order to achieve linear time complexity, we need to be able to insert elements into a data structure (and look them up) in constant time. A Set satisfies these constraints nicely, so we iterate over the array and insert each element into seen. Before inserting it, we check whether it is already there. If it is, then we found our duplicate, so we return it.

[13]:
class Solution5(object):
    def findDuplicate(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        seen = set()
        for num in nums:
            if num in seen:
                return num
            seen.add(num)
[14]:
s = Solution5()

nums = [1,3,4,2,2]
assert s.findDuplicate(nums) == 2

nums = [3,1,3,4,2]
assert s.findDuplicate(nums) == 3

6. Swap

Iteratively put nums[0] to the correct place so that the value equal to the index;

Then 1 to n will be put at index 1 to index n, except for nums[0], which is the answer.

[15]:
class Solution6(object):
    def findDuplicate(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        while nums[nums[0]] != nums[0]:
            nums[nums[0]], nums[0] = nums[0], nums[nums[0]]
        return nums[0]
[16]:
s = Solution6()

nums = [1,3,4,2,2]
assert s.findDuplicate(nums) == 2

nums = [3,1,3,4,2]
assert s.findDuplicate(nums) == 3